Generative Adversarial Networks (GAN)


By Prof. Seungchul Lee
http://iai.postech.ac.kr/
Industrial AI Lab at POSTECH

Table of Contents

Source

  • CS231n: CNN for Visual Recognition

1. Discriminative Model vs. Generative Model

  • Discriminative model




  • Generative model



2. Density Function Estimation

  • Probability
  • What if $x$ represents actual images in the training data? In that case, each image $x$ can be represented as (for example) a $64\times 64 \times 3$-dimensional vector.
    • the following images are realizations (samples) in this $64\times 64 \times 3$-dimensional space
  • Probability density function estimation problem
  • If $P_{\text{model}}(x)$ can be estimated as close to $P_{\text{data}}(x)$, then data can be generated by sampling from $P_{\text{model}}(x)$.

    • Note: the Kullback–Leibler divergence measures how one distribution differs from another; it is used like a distance between two distributions, although it is not symmetric
  • Learn a deterministic transformation via a neural network
    • Start by sampling the code vector $z$ from a simple, fixed distribution such as a uniform distribution or a standard Gaussian $\mathcal{N}(0,I)$
    • Then this code vector is passed as input to a deterministic generator network $G$, which produces an output sample $x=G(z)$
    • This is the role a neural network plays in a generative model (as a nonlinear mapping to a target probability density function)



  • An example of a generator network that encodes a univariate distribution with two different modes
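As a toy illustration of this idea (a sketch, not part of the original notes), a fixed nonlinear map can turn standard Gaussian samples $z$ into a bimodal univariate distribution:

```python
import numpy as np

def toy_generator(z):
    """Deterministic map from z ~ N(0,1) to a bimodal distribution.

    The sign of z selects one of two modes (-2 or +2); a scaled copy
    of z adds spread around each mode.
    """
    return np.where(z > 0, 2.0, -2.0) + 0.3 * z

rng = np.random.default_rng(0)
z = rng.standard_normal(10000)   # samples from the simple, fixed input distribution
x = toy_generator(z)             # samples from the induced p_model(x)

# the induced density has one mode near -2 and another near +2
print(np.mean(x > 0), np.mean(x[x > 0]), np.mean(x[x < 0]))
```

A neural network generator learns such a map from data instead of having it written down by hand.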

  • Generative model of high dimensional space
  • Generative model of images
    • learn a function which maps independent, normally distributed $z$ values to whatever latent variables might be needed by the model, and then maps those latent variables to images $x$
    • use the first few layers to map the normally distributed $z$ to the latent values
    • then use the later layers to map those latent values to an image



3. Generative Adversarial Networks (GAN)

  • In generative modeling, we'd like to train a network that models a distribution, such as a distribution over images.

  • GANs do not work with an explicit density function!

  • Instead, take game-theoretic approach

3.1. Adversarial Nets Framework

  • One way to judge the quality of the model is to sample from it.

  • Train the model to produce samples that are indistinguishable from the real data, as judged by a discriminator network whose job is to tell real from fake





  • The idea behind Generative Adversarial Networks (GANs): train two different networks


  • Discriminator network: try to distinguish between real and fake data


  • Generator network: try to produce realistic-looking samples to fool the discriminator network


3.2. Objective Function of GAN

  • Think of a logistic regression classifier, i.e., the cross-entropy loss between a prediction $h(x)$ and a label $y$


$$\text{loss} = -y \log h(x) - (1-y) \log (1-h(x))$$
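As a small numerical check (not part of the original notes), this loss is small when a confident prediction is correct and large when it is wrong:

```python
import numpy as np

def bce(h, y):
    """Cross-entropy loss for a scalar prediction h = h(x) and label y."""
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

print(bce(0.9, 1))  # confident and correct: small loss (~0.105)
print(bce(0.9, 0))  # confident and wrong: large loss (~2.303)
```

The discriminator of a GAN is trained with exactly this loss, with $y=1$ for real data and $y=0$ for generated data.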

  • To train the discriminator


  • To train the generator


3.3. Solving a Minimax Problem


Step 1: Fix $G$ and perform a gradient step to


$$\max_{D} E_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + E_{z \sim p_{z}(z)}\left[\log (1-D(G(z)))\right]$$

Step 2: Fix $D$ and perform a gradient step to


$$\max_{G} E_{z \sim p_{z}(z)}\left[\log D(G(z))\right]$$

OR



Step 1: Fix $G$ and perform a gradient step to


$$\min_{D} E_{x \sim p_{\text{data}}(x)}\left[-\log D(x)\right] + E_{z \sim p_{z}(z)}\left[-\log (1-D(G(z)))\right]$$

Step 2: Fix $D$ and perform a gradient step to


$$\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z))\right]$$
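The two alternating steps can be sketched on a one-dimensional toy problem (a sketch under assumed forms for $G$ and $D$, not from the original notes): the generator $G(z) = z + \theta$ tries to match $p_{\text{data}} = \mathcal{N}(4,1)$, the discriminator is a single logistic unit, and the gradients of the two losses are written out by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

theta = 0.0          # generator: G(z) = z + theta, so p_model = N(theta, 1)
w, b = 0.0, 0.0      # discriminator: D(x) = sigmoid(w*x + b)
lr, n_batch = 0.05, 64

for _ in range(2000):
    x_real = 4.0 + rng.standard_normal(n_batch)    # samples from p_data = N(4, 1)
    x_fake = theta + rng.standard_normal(n_batch)  # samples from the generator

    # Step 1: fix G, gradient step on min_D E[-log D(x)] + E[-log(1 - D(G(z)))]
    d_real, d_fake = sigmoid(w * x_real + b), sigmoid(w * x_fake + b)
    grad_w = np.mean(-(1 - d_real) * x_real) + np.mean(d_fake * x_fake)
    grad_b = np.mean(-(1 - d_real)) + np.mean(d_fake)
    w, b = w - lr * grad_w, b - lr * grad_b

    # Step 2: fix D, gradient step on min_G E[-log D(G(z))]
    d_fake = sigmoid(w * x_fake + b)
    grad_theta = np.mean(-(1 - d_fake) * w)
    theta = theta - lr * grad_theta

print(theta)  # theta should have moved from 0 toward the data mean 4
```

The same alternation, with automatic differentiation in place of the hand-derived gradients, is what the TensorFlow implementation below performs.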

4. GAN with MNIST

4.1. GAN Implementation

In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [2]:
n_D_input = 28*28
n_D_hidden = 256
n_D_output = 1

n_G_input = 128
n_G_hidden = 256
n_G_output = 28*28
In [3]:
weights = {
    'G1' : tf.Variable(tf.random_normal([n_G_input, n_G_hidden], stddev = 0.01)),
    'G2' : tf.Variable(tf.random_normal([n_G_hidden, n_G_output], stddev = 0.01)),
    'D1' : tf.Variable(tf.random_normal([n_D_input, n_D_hidden], stddev = 0.01)),
    'D2' : tf.Variable(tf.random_normal([n_D_hidden, n_D_output], stddev = 0.01))
}

biases = {
    'G1' : tf.Variable(tf.zeros([n_G_hidden])),
    'G2' : tf.Variable(tf.zeros([n_G_output])),
    'D1' : tf.Variable(tf.zeros([n_D_hidden])),
    'D2' : tf.Variable(tf.zeros([n_D_output]))
}

z = tf.placeholder(tf.float32, [None, n_G_input])
x = tf.placeholder(tf.float32, [None, n_D_input])
In [4]:
def generator(G_input, weights, biases):
    hidden = tf.nn.relu(tf.matmul(G_input, weights['G1']) + biases['G1'])
    output = tf.nn.sigmoid(tf.matmul(hidden, weights['G2']) + biases['G2'])
    return output
In [5]:
def discriminator(D_input, weights, biases):
    hidden = tf.nn.relu(tf.matmul(D_input, weights['D1']) + biases['D1'])
    output = tf.nn.sigmoid(tf.matmul(hidden, weights['D2']) + biases['D2'])
    return output
In [6]:
def make_noise(n_batch, n_G_input):
    return np.random.normal(size = (n_batch, n_G_input))
In [7]:
G_output = generator(z, weights, biases)

D_fake = discriminator(G_output, weights, biases)
D_real = discriminator(x, weights, biases)

Step 1: Fix $G$ and perform a gradient step to

$$\min_{D} E_{x \sim p_{\text{data}}(x)}\left[-\log D(x)\right] + E_{z \sim p_{z}(z)}\left[-\log (1-D(G(z)))\right]$$

Step 2: Fix $D$ and perform a gradient step to

$$\min_{G} E_{z \sim p_{z}(z)}\left[-\log D(G(z))\right]$$
In [8]:
D_loss = tf.reduce_mean(- tf.log(D_real) - tf.log(1 - D_fake))
G_loss = tf.reduce_mean(- tf.log(D_fake))
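A side note (not in the original code): taking `tf.log` of a sigmoid output can produce `-inf` when the discriminator saturates at 0 or 1. A numerically stable alternative, sketched here in numpy, computes the cross-entropy directly from the logit; TensorFlow provides the same idea as `tf.nn.sigmoid_cross_entropy_with_logits`:

```python
import numpy as np

def bce_with_logits(logit, target):
    """Stable form of -target*log(sigmoid(l)) - (1-target)*log(1-sigmoid(l))."""
    return np.maximum(logit, 0) - logit * target + np.log1p(np.exp(-np.abs(logit)))

# matches the naive formula at moderate logits ...
naive = -np.log(1.0 / (1.0 + np.exp(-2.0)))   # -log sigmoid(2)
print(bce_with_logits(2.0, 1.0), naive)

# ... but stays finite where the naive formula would overflow:
# sigmoid(1000) rounds to 1.0, so log(1 - sigmoid(1000)) would be -inf
print(bce_with_logits(1000.0, 0.0))
```

The simple losses above work for this small example, but the logit-based form is the safer choice in practice.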
In [9]:
D_var_list = [weights['D1'], biases['D1'], weights['D2'], biases['D2']]
G_var_list = [weights['G1'], biases['G1'], weights['G2'], biases['G2']]
In [10]:
LR = 0.0002
D_optm = tf.train.AdamOptimizer(LR).minimize(D_loss, var_list = D_var_list)
G_optm = tf.train.AdamOptimizer(LR).minimize(G_loss, var_list = G_var_list)
In [11]:
%%time
n_batch = 100
n_iter = 50000
n_prt = 5000

sess = tf.Session()
sess.run(tf.global_variables_initializer())

D_loss_record = []
G_loss_record = []
for epoch in range(n_iter):
    train_x, train_y = mnist.train.next_batch(n_batch)
    noise = make_noise(n_batch, n_G_input)

    # discriminator and generator are separately trained 
    sess.run(D_optm, feed_dict = {x: train_x, z: noise})
    sess.run(G_optm, feed_dict = {z: noise}) 
    
    if epoch % n_prt == 0:
        D_loss_val = sess.run(D_loss, feed_dict = {x: train_x, z: noise})
        G_loss_val = sess.run(G_loss, feed_dict = {z: noise})
        D_loss_record.append(D_loss_val)
        G_loss_record.append(G_loss_val)
    
        print('Epoch:', '%04d' % epoch, 'D_loss: {:.4}'.format(D_loss_val), 'G_loss: {:.4}'.format(G_loss_val))
        
        plt.figure(figsize = (10,5))
        plt.subplot(1,2,1)
        noise = make_noise(n_batch, n_G_input)
        G_img = sess.run(G_output, feed_dict = {z: noise})   
        plt.imshow(G_img[0,:].reshape(28,28), 'gray')
        plt.axis('off')
        plt.subplot(1,2,2)
        noise = make_noise(n_batch, n_G_input)
        G_img = sess.run(G_output, feed_dict = {z: noise})   
        plt.imshow(G_img[0,:].reshape(28,28), 'gray')
        plt.axis('off')
        plt.show()
Epoch: 0000 D_loss: 1.354 G_loss: 0.7273
Epoch: 5000 D_loss: 0.479 G_loss: 2.027
Epoch: 10000 D_loss: 0.3313 G_loss: 2.451
Epoch: 15000 D_loss: 0.2518 G_loss: 2.92
Epoch: 20000 D_loss: 0.3368 G_loss: 3.071
Epoch: 25000 D_loss: 0.4355 G_loss: 3.215
Epoch: 30000 D_loss: 0.4977 G_loss: 2.458
Epoch: 35000 D_loss: 0.5361 G_loss: 2.285
Epoch: 40000 D_loss: 0.5989 G_loss: 1.974
Epoch: 45000 D_loss: 0.7574 G_loss: 1.835
Wall time: 3min 51s

4.2. After Training

  • After training, use the generator network to generate new data


In [12]:
noise = make_noise(n_batch, n_G_input)
G_img = sess.run(G_output, feed_dict = {z: noise})

plt.figure(figsize = (5,5))
plt.imshow(G_img[0,:].reshape(28,28), 'gray')
plt.axis('off')
plt.show()

5. Conditional GAN

  • In an unconditioned generative model, there is no control over the modes of the data being generated.
  • In a Conditional GAN (CGAN), the generator learns to generate a fake sample with a specific condition or characteristic (such as a label associated with an image, or a more detailed tag) rather than a generic sample from an unknown noise distribution.




  • A simple modification to the original GAN framework that conditions the model on additional information for better multi-modal learning
  • There are many practical applications of GANs when explicit supervision is available
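The conditioning is implemented below by concatenating a one-hot label vector onto both the generator's noise input and the discriminator's image input. In numpy terms (a sketch, with dimensions chosen to match the code that follows):

```python
import numpy as np

n_batch, n_G_input, n_label = 4, 128, 10

noise  = np.random.normal(size = (n_batch, n_G_input))
labels = np.array([3, 1, 4, 1])        # hypothetical digit labels for this batch
onehot = np.eye(n_label)[labels]       # one-hot encoding, shape (4, 10)

# the conditional generator sees noise and label together
G_input = np.concatenate([noise, onehot], axis = 1)
print(G_input.shape)   # (4, 138): n_G_input + n_label columns
```

This is why the first-layer weight matrices below have `n_G_input + n_label` and `n_D_input + n_label` input dimensions.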
In [13]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./mnist/data/", one_hot=True)
Extracting ./mnist/data/train-images-idx3-ubyte.gz
Extracting ./mnist/data/train-labels-idx1-ubyte.gz
Extracting ./mnist/data/t10k-images-idx3-ubyte.gz
Extracting ./mnist/data/t10k-labels-idx1-ubyte.gz
In [14]:
n_D_input = 28*28
n_D_hidden = 256
n_D_output = 1

n_G_input = 128
n_G_hidden = 256
n_G_output = 28*28

n_label = 10 # one-hot-encoding
In [15]:
weights = {
    'G1' : tf.Variable(tf.random_normal([n_G_input + n_label, n_G_hidden], stddev = 0.01)),
    'G2' : tf.Variable(tf.random_normal([n_G_hidden, n_G_output], stddev = 0.01)),
    'D1' : tf.Variable(tf.random_normal([n_D_input + n_label, n_D_hidden], stddev = 0.01)),
    'D2' : tf.Variable(tf.random_normal([n_D_hidden, n_D_output], stddev = 0.01))
}

biases = {   
    'G1' : tf.Variable(tf.zeros([n_G_hidden])),
    'G2' : tf.Variable(tf.zeros([n_G_output])),
    'D1' : tf.Variable(tf.zeros([n_D_hidden])),
    'D2' : tf.Variable(tf.zeros([n_D_output]))
}

z = tf.placeholder(tf.float32, [None, n_G_input])
x = tf.placeholder(tf.float32, [None, n_D_input])
c = tf.placeholder(tf.float32, [None, n_label])
In [16]:
def generator(G_input, label, weights, biases):
    hidden = tf.nn.relu(tf.matmul(tf.concat([G_input, label], 1), weights['G1']) + biases['G1'])    
    output = tf.nn.sigmoid(tf.matmul(hidden, weights['G2']) + biases['G2'])
    return output
In [17]:
def discriminator(D_input, label, weights, biases):
    hidden = tf.nn.relu(tf.matmul(tf.concat([D_input, label], 1), weights['D1']) + biases['D1'])
    output = tf.nn.sigmoid(tf.matmul(hidden, weights['D2']) + biases['D2'])
    return output
In [18]:
def make_noise(n_batch, n_G_input):
    return np.random.normal(size = (n_batch, n_G_input))
In [19]:
G_output = generator(z, c, weights, biases)
D_fake = discriminator(G_output, c, weights, biases)
D_real = discriminator(x, c, weights, biases)

D_loss = tf.reduce_mean(-tf.log(D_real)-tf.log(1 - D_fake))
G_loss = tf.reduce_mean(-tf.log(D_fake))

D_var_list = [weights['D1'], biases['D1'], weights['D2'], biases['D2']]
G_var_list = [weights['G1'], biases['G1'], weights['G2'], biases['G2']]

LR = 0.0002
D_optm = tf.train.AdamOptimizer(LR).minimize(D_loss, var_list = D_var_list)
G_optm = tf.train.AdamOptimizer(LR).minimize(G_loss, var_list = G_var_list)
In [20]:
%%time
n_batch = 100
n_iter = 50000
n_prt = 5000

sess = tf.Session()
sess.run(tf.global_variables_initializer())

D_loss_record = []
G_loss_record = []
for epoch in range(n_iter):
    train_x, train_y = mnist.train.next_batch(n_batch)
    noise = make_noise(n_batch, n_G_input)

    # discriminator and generator are separately trained 
    sess.run(D_optm, feed_dict = {x: train_x, z: noise, c: train_y})
    sess.run(G_optm, feed_dict = {z: noise, c: train_y})

    if epoch % n_prt == 0:
        D_loss_val = sess.run(D_loss, feed_dict = {x: train_x, z: noise, c: train_y})
        G_loss_val = sess.run(G_loss, feed_dict = {z: noise, c: train_y})
        D_loss_record.append(D_loss_val)
        G_loss_record.append(G_loss_val)
        
        print('Epoch:', '%04d' % epoch, 'D_loss: {:.4}'.format(D_loss_val), 'G_loss: {:.4}'.format(G_loss_val))

        plt.figure(figsize = (5,5))
        noise = make_noise(1, n_G_input)
        _, train_y = mnist.train.next_batch(1)
        G_img = sess.run(G_output, feed_dict = {z: noise, c: train_y})   
        plt.imshow(G_img.reshape(28,28), 'gray')
        plt.axis('off')
        plt.show()
Epoch: 0000 D_loss: 1.351 G_loss: 0.7377
Epoch: 5000 D_loss: 0.2416 G_loss: 2.982
Epoch: 10000 D_loss: 0.3069 G_loss: 2.961
Epoch: 15000 D_loss: 0.5064 G_loss: 2.399
Epoch: 20000 D_loss: 0.345 G_loss: 2.652
Epoch: 25000 D_loss: 0.5571 G_loss: 2.555
Epoch: 30000 D_loss: 0.6208 G_loss: 2.048
Epoch: 35000 D_loss: 0.5439 G_loss: 2.139
Epoch: 40000 D_loss: 0.6214 G_loss: 2.09
Epoch: 45000 D_loss: 0.6606 G_loss: 2.242
Wall time: 4min 19s

Generate fake MNIST images with the CGAN

In [21]:
noise = make_noise(1, n_G_input)
G_img = sess.run(G_output, feed_dict = {z: noise, c: [[0,0,0,0,0,1,0,0,0,0]]})

plt.figure(figsize = (5,5))
plt.imshow(G_img.reshape(28,28), 'gray')
plt.axis('off')
plt.show()

6. Other Tutorials

In [22]:
%%html
<center><iframe src="https://www.youtube.com/embed/9JpdAg6uMXs?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

CS231n: CNN for Visual Recognition

In [23]:
%%html
<center><iframe src="https://www.youtube.com/embed/5WoItGTWV54?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

MIT by Aaron Courville

In [24]:
%%html
<center><iframe src="https://www.youtube.com/embed/JVb54xhEw6Y?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>

Univ. of Waterloo by Ali Ghodsi

In [25]:
%%html
<center><iframe src="https://www.youtube.com/embed/7G4_Y5rsvi8?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [26]:
%%html
<center><iframe src="https://www.youtube.com/embed/odpjk7_tGY0?rel=0" 
width="560" height="315" frameborder="0" allowfullscreen></iframe></center>
In [27]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')